2  The whole game

Multiple imputation workflows with propensity score and survival analysis

Author

Janick Weberpals, RPh, PhD

Published

September 11, 2024

2.1 Background

  • In 2022, nearly one in three drug approvals was in oncology, and the share is trending upward (Mullard 2022)

  • Decision-makers increasingly rely on real-world evidence (RWE) generated from routine-care health data such as electronic health records (EHR) to evaluate the comparative safety and effectiveness of novel cancer therapies

ENCORE

  • The ENCORE project extends RCT DUPLICATE to oncology and will emulate 12 randomized clinical trials using multiple EHR data sources. The process emphasizes transparency: the fitness of the real-world data (RWD) source is assessed and documented for each trial, and extensive sensitivity analyses are conducted to assess robustness of findings and trial eligibility criteria.

  • Partially observed covariates/confounders are a common and pervasive challenge

  • To date, most oncology studies using RWD have relied on complete case analysis, even though its assumption (missing completely at random [MCAR]) is stronger than the one required for multiple imputation (MI) (missing at random [MAR]). MI also has additional advantages:

    • All patients are retained

    • Flexible modeling (parametric, non-parametric)

    • Can incorporate additional information (auxiliary covariates) to make the MAR assumption more plausible

    • Realistic variance estimation (Rubin’s rules)

  • However:

    • Not much is known about how to use multiple imputation in combination with propensity score-based approaches

    • Computational implementation can be complex
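
Rubin's rules, named above as the basis for variance estimation after MI, combine the per-dataset estimates into one pooled estimate whose variance adds the between-imputation spread to the average within-imputation variance. The following is a minimal Python sketch of that arithmetic, not the implementation of any particular package; the function name `pool_rubin` and the log hazard ratios in the usage lines are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical log hazard ratios and standard errors from m = 3 imputed datasets
log_hrs = [-0.30, -0.25, -0.35]
ses = [0.10, 0.11, 0.10]
pooled, total_var = pool_rubin(log_hrs, [se ** 2 for se in ses])
pooled_se = sqrt(total_var)
```

In a survival setting the pooling would typically be done on the log hazard ratio scale, with the pooled estimate and its confidence limits exponentiated afterwards.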

2.2 Objective

To establish a computationally reproducible and transparent workflow that streamlines the sequence multiple imputation > propensity score matching/weighting > survival analysis.

Figure 2.1: Streamlined workflow to approach partially observed covariate data in oncology trial emulations.

2.3 Leyrat et al. simulation study

One of the most comprehensive and influential simulation studies on how to combine multiple imputation with propensity scores (inverse probability of treatment weighting, IPTW) was published in 2019 (Leyrat et al. 2019). In this study, the authors compared three candidate approaches:

  • MIte: MI > PS estimation > Outcome model for each PS model > Pooling of results

  • MIps: MI > PS estimation > PS pooling across datasets > single outcome model

  • MIpar: MI > Pooling of covariate parameters > single PS model > single outcome model
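
The difference between the first two approaches can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method as implemented by Leyrat et al.: treatment and outcome are assumed fully observed (so only the propensity scores differ between imputed datasets), the outcome is continuous rather than time-to-event, and the propensity scores are taken as already estimated in each imputed dataset. All function names are hypothetical. MIpar is left out because it requires the PS model coefficients rather than the fitted scores.

```python
from statistics import mean


def iptw_mean_difference(treated, outcome, ps):
    """Normalized IPTW estimate of the average treatment effect (difference in weighted means)."""
    w1 = [t / p for t, p in zip(treated, ps)]              # weights for treated patients
    w0 = [(1 - t) / (1 - p) for t, p in zip(treated, ps)]  # weights for untreated patients
    mu1 = sum(w * y for w, y in zip(w1, outcome)) / sum(w1)
    mu0 = sum(w * y for w, y in zip(w0, outcome)) / sum(w0)
    return mu1 - mu0


def mi_te(treated, outcome, ps_per_dataset):
    """MIte: one treatment effect per imputed dataset, then average the effects."""
    return mean(iptw_mean_difference(treated, outcome, ps) for ps in ps_per_dataset)


def mi_ps(treated, outcome, ps_per_dataset):
    """MIps: average each patient's PS across datasets, then estimate one effect."""
    pooled_ps = [mean(patient_ps) for patient_ps in zip(*ps_per_dataset)]
    return iptw_mean_difference(treated, outcome, pooled_ps)


# Hypothetical toy data: 4 patients, propensity scores from 2 imputed datasets
treated = [1, 1, 0, 0]
outcome = [3.0, 5.0, 1.0, 2.0]
ps_per_dataset = [
    [0.8, 0.4, 0.5, 0.5],
    [0.4, 0.8, 0.5, 0.5],
]
effect_mite = mi_te(treated, outcome, ps_per_dataset)
effect_mips = mi_ps(treated, outcome, ps_per_dataset)
```

The sketch only covers point estimates; under MIte the per-dataset variances would additionally be combined with Rubin's rules.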

Additional questions that were also addressed:

  • Should the outcome be included in the imputation model?

  • How should the variance of the IPTW estimator be estimated in the context of MIte, MIps, or MIpar?

Figure 2.2: Illustration of potential approaches that could be considered after multiple imputation (MI) of the missing values of partially observed covariates in the original dataset.

2.3.1 Simulation study results

  • MIte performed best in terms of bias, covariate balance (standardized differences), coverage rate and variance estimation

    • MI > PS estimation > Outcome model for each PS model > Pooling of results

  • Standard IPTW variance estimation is valid for MIte

  • The outcome must be included in the imputation model

Figure 2.3: Leyrat et al. simulation study results.

2.3.2 Implementation in MatchThem R package

To streamline the implementation of multiple imputation > propensity score workflows, Farhad Pishgar, Noah Greifer, Clémence Leyrat and Elizabeth Stuart developed the MatchThem package (Pishgar et al. 2021), which builds on the functionality provided by the mice, MatchIt, and WeightIt packages. An example of how to use the package in a survival analysis context is given in Chapter 3 (cheatsheet).

2.4 References

Buuren, Stef van, and Karin Groothuis-Oudshoorn. 2011. “Mice: Multivariate Imputation by Chained Equations in R.” Journal of Statistical Software 45 (3): 1–67. https://doi.org/10.18637/jss.v045.i03.
Green, Kerry M., and Elizabeth A. Stuart. 2014. “Examining Moderation Analyses in Propensity Score Methods: Application to Depression and Substance Use.” Journal of Consulting and Clinical Psychology 82 (5): 773–83. https://doi.org/10.1037/a0036515.
Leyrat, Clémence, Shaun R Seaman, Ian R White, Ian Douglas, Liam Smeeth, Joseph Kim, Matthieu Resche-Rigon, James R Carpenter, and Elizabeth J Williamson. 2019. “Propensity Score Analysis with Partially Observed Covariates: How Should Multiple Imputation Be Used?” Statistical Methods in Medical Research 28 (1): 3–19. https://doi.org/10.1177/0962280217713032.
Lumley, Thomas. 2024. “Survey: Analysis of Complex Survey Samples.”
Mullard, Asher. 2022. “2021 FDA Approvals.” Nature Reviews Drug Discovery 21 (2): 83–88. https://doi.org/10.1038/d41573-022-00001-9.
Pishgar, Farhad, Noah Greifer, Clémence Leyrat, and Elizabeth Stuart. 2021. “MatchThem: Matching and Weighting after Multiple Imputation.” R Journal 13 (2): 292–305. https://doi.org/10.32614/RJ-2021-073.
Shah, Anoop D., Jonathan W. Bartlett, James Carpenter, Owen Nicholas, and Harry Hemingway. 2014. “Comparison of Random Forest and Parametric Imputation Models for Imputing Missing Data Using MICE: A CALIBER Study.” American Journal of Epidemiology 179 (6): 764–74. https://doi.org/10.1093/aje/kwt312.
Stekhoven, Daniel J., and Peter Bühlmann. 2012. “MissForest: Non-Parametric Missing Value Imputation for Mixed-Type Data.” Bioinformatics 28 (1): 112–18. https://doi.org/10.1093/bioinformatics/btr597.
Therneau, Terry M. 2024. “A Package for Survival Analysis in R.” https://CRAN.R-project.org/package=survival.
Weberpals, Janick, Sudha Raman, Pamela Shaw, Hana Lee, Massimiliano Russo, Bradley Hammill, Sengwee Toh, et al. 2024. “A Principled Approach to Characterize and Analyze Partially Observed Confounder Data from Electronic Health Records.” Clinical Epidemiology 16 (May): 329–43. https://doi.org/10.2147/clep.s436131.